    DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning

    Full text link
    We present DRLViz, a visual analytics interface to interpret the internal memory of an agent (e.g. a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors that are updated as the agent moves through an environment, and it is not trivial to understand due to the number of dimensions, dependencies on past vectors, spatial/temporal correlations, and co-correlations between dimensions. The model is often referred to as a black box, as only its inputs (images) and outputs (actions) are intelligible to humans. Using DRLViz, experts are assisted in interpreting decisions through memory reduction interactions, and in investigating the role of parts of the memory when errors have been made (e.g. a wrong direction). We report on DRLViz applied in the context of a video game simulator (ViZDoom) for a navigation scenario with item-gathering tasks. We also report on experts' evaluation of DRLViz, its applicability to other scenarios and navigation problems beyond simulation games, and its contribution to the interpretability and explainability of black-box models in the field of visual analytics.
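
    As a minimal sketch of the kind of data such a tool visualizes, the snippet below records the hidden memory vector of a toy recurrent agent at every step of an episode. The architecture, the 84x84 frame size, and the `env` API are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the data DRLViz-style tools visualize: the recurrent
# memory of a DRL agent recorded at every step. The architecture, 84x84
# frames, and the `env` object are assumptions for illustration.
import torch
import torch.nn as nn

class RecurrentAgent(nn.Module):
    """Toy ViZDoom-style agent: CNN features -> GRU memory -> action logits."""
    def __init__(self, n_actions: int, memory_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),                    # 84x84 input -> 32*9*9 features
        )
        self.gru = nn.GRUCell(32 * 9 * 9, memory_dim)
        self.policy = nn.Linear(memory_dim, n_actions)

    def step(self, frame, memory):
        memory = self.gru(self.encoder(frame), memory)   # the "black box"
        return self.policy(memory), memory

def record_episode(agent, env, memory_dim=128, max_steps=500):
    """Roll out one episode, keeping (memory, action) per timestep."""
    memory = torch.zeros(1, memory_dim)
    frame, trace = env.reset(), []           # hypothetical env API
    for _ in range(max_steps):
        logits, memory = agent.step(frame, memory)
        action = int(logits.argmax(dim=-1))
        trace.append((memory.detach().squeeze(0), action))
        frame, done = env.step(action)
        if done:
            break
    return trace   # stacking the memories gives the T x D timeline view
```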

    How Transferable are Reasoning Patterns in VQA?

    Full text link
    Since its inception, Visual Question Answering (VQA) has been notorious as a task where models are prone to exploiting biases in datasets to find shortcuts instead of performing high-level reasoning. Classical methods address this by removing biases from training data, or by adding branches to models to detect and remove biases. In this paper, we argue that uncertainty in vision is a dominating factor preventing the successful learning of reasoning in vision-and-language problems. We train a visual oracle and, in a large-scale study, provide experimental evidence that it is much less prone to exploiting spurious dataset biases than standard models. We propose to study the attention mechanisms at work in the visual oracle and compare them with those of a SOTA Transformer-based model. We provide an in-depth analysis and visualizations of reasoning patterns obtained with an online visualization tool, which we make publicly available (https://reasoningpatterns.github.io). We exploit these insights by transferring reasoning patterns from the oracle to a SOTA Transformer-based VQA model taking standard noisy visual inputs, via fine-tuning. In experiments we report higher overall accuracy, as well as higher accuracy on infrequent answers for each question type, which provides evidence for improved generalization and a decreased dependency on dataset biases.
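
    The transfer step can be sketched as follows: a student model starts from the oracle's weights and is then fine-tuned on standard noisy visual inputs. The model classes and the data loader are placeholders, not the authors' code.

```python
# Hedged sketch of transferring reasoning patterns via fine-tuning. The
# initialization-then-fine-tune scheme is an assumption about the mechanics;
# `oracle`, `student`, and `noisy_loader` are hypothetical placeholders.
import torch

def transfer_reasoning_patterns(oracle, student, noisy_loader, epochs=3, lr=1e-5):
    # Initializing from the oracle preserves the attention ("reasoning")
    # patterns it learned from clean, noise-free visual input.
    student.load_state_dict(oracle.state_dict())
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for visual_feats, question_tokens, answer in noisy_loader:
            logits = student(visual_feats, question_tokens)
            loss = loss_fn(logits, answer)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```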

    Théo Guesser

    No full text

    RLMViz: Interpreting the Memory of Deep Reinforcement Learning

    No full text
    We present RLMViz, a visual analytics interface to interpret the internal memory of an agent (e.g., a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors updated before each action as the agent moves in an environment. It is not trivial to understand and is referred to as a black box: only its inputs (images) and outputs (actions) are understood, not its inner workings. Using RLMViz, experts can form hypotheses about this memory, derive rules based on the agent's decisions to interpret it, gain an understanding of why errors have been made, and improve the future training process. We report on the main features of RLMViz, which are memory navigation and contextualization techniques using timeline juxtapositions. We also present our early findings using the ViZDoom simulator, a standard benchmark for DRL navigation scenarios.
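
    One analysis of this kind can be sketched as a correlation between memory dimensions and actions, from which an expert might derive a rule such as "dimension 42 activates before turning left". The (T, D) trace layout is an assumption for illustration, not the tool's actual format.

```python
# Illustrative sketch of rule derivation from an agent's memory trace:
# measure how strongly each memory dimension co-varies with a chosen action.
import numpy as np

def memory_action_correlation(memory_matrix, actions, action_id):
    """Pearson correlation of each memory dimension with one action.

    memory_matrix: (T, D) hidden states; actions: (T,) chosen action ids.
    """
    mem = np.asarray(memory_matrix, dtype=float)
    indicator = (np.asarray(actions) == action_id).astype(float)
    mem_c = mem - mem.mean(axis=0)
    ind_c = indicator - indicator.mean()
    denom = mem_c.std(axis=0) * ind_c.std() * len(ind_c)
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (mem_c * ind_c[:, None]).sum(axis=0) / denom
    return corr   # dimensions with |corr| close to 1 are rule candidates
```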

    What if we Reduce the Memory of an Artificial Doom Player?

    No full text

    SIM2REALVIZ: Visualizing the Sim2Real Gap for Robot Pose Estimation

    No full text
    The robotics community has started to rely heavily on increasingly realistic 3D simulators for large-scale training of robots on massive amounts of data. But once robots are deployed in the real world, the simulation gap, as well as changes in the real world (e.g. lighting, object displacements), leads to errors. In this paper, we introduce SIM2REALVIZ, a visual analytics tool to assist experts in understanding and reducing this gap for robot ego-pose estimation tasks, i.e. the estimation of a robot's position using trained models. SIM2REALVIZ displays details of a given model and the performance of its instances in both simulation and the real world. Experts can identify environment differences that impact model predictions at a given location and, through direct interactions with the model, explore hypotheses to fix them. We detail the design of the tool, along with case studies on the exploitation of the regression-to-the-mean bias and how it can be addressed, and on how models are perturbed by vanishing landmarks such as bikes.
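
    The central quantity such a tool surfaces can be sketched as a per-location gap between pose errors in simulation and in the real world; the parallel-array data layout below is an assumption for illustration.

```python
# Minimal sketch of a sim-to-real gap measure for ego-pose estimation.
# The (x, y) prediction arrays and their alignment are assumed for
# illustration, not taken from the tool's actual data model.
import numpy as np

def pose_error(pred_xy, true_xy):
    """Euclidean position error per sample, shape (N,)."""
    return np.linalg.norm(np.asarray(pred_xy) - np.asarray(true_xy), axis=1)

def sim2real_gap(sim_pred_xy, real_pred_xy, true_xy):
    """Positive values mean the model degrades when moved to the real world."""
    return pose_error(real_pred_xy, true_xy) - pose_error(sim_pred_xy, true_xy)

# The largest gaps flag locations worth inspecting, e.g. where a landmark
# present in simulation (a parked bike) has vanished from the real scene:
# worst = np.argsort(-sim2real_gap(sim_pred, real_pred, true_xy))[:10]
```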

    SwimTrack: Swimmers and Stroke Rate Detection in Elite Race Videos

    No full text
    We present SwimTrack, a series of 5 multimedia tasks related to swimming video analysis from live recordings of elite competitions. The tasks involve video, image, and audio analysis and may be tackled independently, but solved together they form a grand challenge to provide sports federations and coaches with novel methods to assess and enhance swimmers' performance, in particular with respect to stroke rate and stroke length analysis. We share a unique collection of video footage that covers all swimming race types, recorded from a spectator's point of view with variations such as lighting reflections, background clutter, noise from the motion of waves, and different viewpoints on swimmers. SwimTrack is the first challenge of its kind, covering a total of 4 elite swimming competitions. We plan to include a larger and even more diverse set of videos, as well as additional mini-challenges, once more recordings become available for a future version.
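
    As a hedged baseline for the stroke-rate task, the sketch below estimates strokes per minute from the dominant frequency of a periodic per-frame signal, e.g. the vertical position of a tracked wrist. This is an illustrative approach, not the challenge's reference method.

```python
# Illustrative stroke-rate estimator: pick the dominant frequency of a
# periodic per-frame signal within a plausible band. The signal extraction
# step (tracking a wrist or elbow) is assumed to have happened upstream.
import numpy as np

def stroke_rate(signal, fps, min_hz=0.3, max_hz=2.0):
    """Dominant frequency of `signal` converted to strokes per minute.

    signal: 1-D array sampled at `fps` frames per second.
    min_hz/max_hz bound plausible stroke frequencies (~18-120 strokes/min).
    """
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                       # drop the DC component
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    band = (freqs >= min_hz) & (freqs <= max_hz)
    dominant = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * dominant                       # Hz -> strokes per minute
```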

    VisQA: X-raying Vision and Language Reasoning in Transformers

    No full text
    Visual Question Answering systems target answering open-ended textual questions given input images. They are a testbed for learning high-level reasoning, with a primary use in HCI, for instance assistance for the visually impaired. Recent research has shown that state-of-the-art models tend to produce answers that exploit biases and shortcuts in the training data, and sometimes do not even look at the input image, rather than performing the required reasoning steps. We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation. It exposes the key element of state-of-the-art neural models: attention maps in transformers. Our working hypothesis is that reasoning steps leading to model predictions are observable from attention distributions, which are particularly useful for visualization. The design process of VisQA was motivated by well-known bias examples from the fields of deep learning and vision-language reasoning, and the tool was evaluated in two ways. First, as a result of a collaboration across three fields (machine learning, vision-and-language reasoning, and data analytics), the work led to a better understanding of bias exploitation by neural models for VQA, which eventually had an impact on their design and training through the proposal of a method for transferring reasoning patterns from an oracle model. Second, we report on the design of VisQA and on a goal-oriented evaluation of it, targeting the analysis of a model's decision process by multiple experts, providing evidence that it makes the inner workings of models accessible to users.
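
    The raw material such a tool visualizes can be collected with a forward hook on an attention layer. In the sketch below, a generic PyTorch `nn.MultiheadAttention` stands in for the actual VQA model, and `average_attn_weights=False` (available in recent PyTorch versions) keeps the per-head maps; the hook pattern generalizes to attention layers nested inside a full model.

```python
# Minimal sketch: pull per-head attention maps from a transformer layer via
# a forward hook. A generic attention layer stands in for the VQA model.
import torch
import torch.nn as nn

layer = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attention_maps = []

def save_attention(module, inputs, output):
    # output is (attn_output, attn_weights); with average_attn_weights=False
    # the weights have shape (batch, num_heads, n_tokens, n_tokens).
    attention_maps.append(output[1].detach())

layer.register_forward_hook(save_attention)

tokens = torch.randn(1, 10, 64)   # e.g. question tokens + image region tokens
layer(tokens, tokens, tokens, need_weights=True, average_attn_weights=False)
print(attention_maps[0].shape)    # torch.Size([1, 4, 10, 10])
```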